This markdown is for analyzing the Allen Brain Atlas MTG (middle temporal gyrus) dataset. Some details of their analysis is provided here. Briefly, these data were generated using SMART-Seq v4 Ultra Low Input RNA Kit, which is an improved version of SMART-seq2 protocol.
This pipeline using some alternative strategies for data processing and analysis, mostly based on bioconductor workflows for scRNAseq.
The packages required for the analysis are as follows: - scater: collection of tools for doing quality control analyses of scRNA-seq - scran: methods provide normalization of cell-specific biases, correcting batch effects, identify marker genes - SC3: package for single cell consensus clustering.
bio_packs = c("SingleCellExperiment","biomaRt","scater","scran","SC3")
source("https://bioconductor.org/biocLite.R")
# biocLite("BiocUpgrade",suppressUpdates = TRUE)
for(pack1 in bio_packs){
if( !pack1 %in% installed.packages()[,"Package"]){
biocLite(pack1, suppressUpdates = TRUE)
}
}
cran_packs = c("data.table", "svd", "Rtsne")
for(pack1 in cran_packs){
if( !pack1 %in% installed.packages()[,"Package"]){
install.packages(pack1)
}
}
library(SingleCellExperiment)
library(scater)
library(scran)
library(limma)
library(data.table)
library(svd)
library(Rtsne)
library(SC3)Before running this R markdown, please first download the datafile human_MTG_gene_expression_matrices_2018-06-14.zip from http://celltypes.brain-map.org/api/v2/well_known_file_download/694416044 to your local computer, and unzip the file. Then in the following code chunk, set the working directory to be the directory of the unzipped folder.
You can modify the following code to determine if the analysis will be conducted with exon counts only(exon_only = TRUE) or with both exon and intron counts summed together (exon_only = FALSE). We have requested 7 cores (num_cores = 7) to improve SC3 runtime.
# work_dir = "~/research/scRNAseq/data/Allen_BI/"
# work_dir = paste0(work_dir, "human_MTG_gene_expression_matrices_2018-06-14")
# repo_dir = "~/research/GitHub/scRNAseq_pipelines"
repo_dir = "/pine/scr/p/l/pllittle/CS_eQTL/s3_Real/scRNAseq_pipelines"
work_dir = file.path(repo_dir,"MTG")
source(file.path(repo_dir,"SOURCE.R"))
exon_only = TRUE
run_sc3 = !TRUE
num_cores = ifelse(run_sc3,7,1)
exons_fn = "human_MTG_2018-06-14_exon-matrix.csv"
introns_fn = "human_MTG_2018-06-14_intron-matrix.csv"The following code read-in count data. To be compatible with other studies that only use count data from exonic regions, here we just read the exon counts.
counts = data.table::fread(file.path(work_dir,exons_fn),data.table = FALSE)
dim(counts)## [1] 50281 15929
counts[1:2,1:2]## V1 F1S4_160106_001_B01
## 1 353007 0
## 2 353008 0
rownames(counts) = counts$V1
counts = as.matrix(counts[,-1])
if( !exon_only ) {
intron_counts = fread(file.path(work_dir, introns_fn), data.table=FALSE)
rownames(intron_counts) = intron_counts$V1
intron_counts = as.matrix(intron_counts[,-1])
counts = counts + intron_counts
}Next we add the sample/cell information and gene information. Spike-ins were used for this data, though in this final count matrix, the spike-ins were not included. We generate an object of SingleCellExperiment named as sce, and then delete those original data.
cell_data = data.table::fread(file.path(work_dir,"human_MTG_2018-06-14_samples-columns.csv"))
dim(cell_data)## [1] 15928 34
cell_data[1:2,]## sample_name sample_id sample_type organism donor sex
## 1: F1S4_160106_001_B01 556012415 Nuclei Homo sapiens H200.1030 M
## 2: F1S4_160106_001_C01 556012410 Nuclei Homo sapiens H200.1030 M
## age_days brain_hemisphere brain_region brain_subregion facs_date
## 1: 19710 L MTG L5 1/6/2016
## 2: 19710 L MTG L5 1/6/2016
## facs_container facs_sort_criteria rna_amplification_set
## 1: F1S4_160106_001 NeuN-positive A8S4_160401_02
## 2: F1S4_160106_001 NeuN-positive A8S4_160401_02
## library_prep_set library_prep_avg_size_bp seq_name seq_tube
## 1: L8S4_160406_02 429 LS-15051_S02_E1-50 LS-15051
## 2: L8S4_160406_02 448 LS-15051_S03_E1-50 LS-15051
## seq_batch total_reads percent_exon_reads percent_intron_reads
## 1: R8S4-160411-H 2572946 45.08065 43.05855
## 2: R8S4-160411-H 2755839 34.91697 50.92145
## percent_intergenic_reads percent_rrna_reads percent_mt_exon_reads
## 1: 11.86080 0 0.09005241
## 2: 14.16158 0 0.05247041
## percent_reads_unique percent_synth_reads percent_ecoli_reads
## 1: 85.57370 0.007534554 0.018536728
## 2: 87.61978 0.002790439 0.007264575
## percent_aligned_reads_total complexity_cg genes_detected_cpm_criterion
## 1: 92.41888 0.3166372 8635
## 2: 94.00132 0.2879887 11697
## genes_detected_fpkm_criterion class cluster
## 1: 5253 GABAergic Inh L4-6 SST B3GAT2
## 2: 8246 Glutamatergic Exc L5-6 RORB TTC12
table(cell_data$class)##
## GABAergic Glutamatergic no class Non-neuronal
## 4164 10525 325 914
table(cell_data$cluster)##
## Astro L1-2 FGFR3 GFAP Astro L1-6 FGFR3 SLC14A1 Endo L2-6 NOSTRIN
## 61 230 9
## Exc L2 LAMP5 LTK Exc L2-3 LINC00507 FREM3 Exc L2-4 LINC00507 GLP2R
## 812 2284 170
## Exc L3-4 RORB CARM1P1 Exc L3-5 RORB COL22A1 Exc L3-5 RORB ESR1
## 280 160 1428
## Exc L3-5 RORB FILIP1L Exc L3-5 RORB TWIST2 Exc L4-5 FEZF2 SCN4B
## 153 93 25
## Exc L4-5 RORB DAPK2 Exc L4-5 RORB FOLH1B Exc L4-6 FEZF2 IL26
## 173 870 344
## Exc L4-6 RORB C1R Exc L4-6 RORB SEMA3E Exc L5-6 FEZF2 ABO
## 160 777 373
## Exc L5-6 FEZF2 EFTUD1P1 Exc L5-6 RORB TTC12 Exc L5-6 SLC17A7 IL15
## 314 167 56
## Exc L5-6 THEMIS C1QL3 Exc L5-6 THEMIS CRABP1 Exc L5-6 THEMIS DCSTAMP
## 1537 147 53
## Exc L5-6 THEMIS FGF10 Exc L6 FEZF2 OR2T8 Exc L6 FEZF2 SCUBE1
## 78 19 52
## Inh L1 SST CHRNA4 Inh L1 SST NMBR Inh L1-2 GAD1 MC4R
## 52 283 107
## Inh L1-2 LAMP5 DBP Inh L1-2 PAX6 CDH12 Inh L1-2 PAX6 TNFAIP8L3
## 21 90 16
## Inh L1-2 SST BAGE2 Inh L1-2 VIP LBH Inh L1-2 VIP PCDH20
## 108 47 61
## Inh L1-2 VIP TSPAN12 Inh L1-3 PAX6 SYT6 Inh L1-3 SST CALB1
## 42 29 279
## Inh L1-3 VIP ADAMTSL1 Inh L1-3 VIP CCDC184 Inh L1-3 VIP CHRM2
## 72 64 175
## Inh L1-3 VIP GGH Inh L1-4 LAMP5 LCP2 Inh L1-4 VIP CHRNA6
## 68 356 25
## Inh L1-4 VIP OPRM1 Inh L1-4 VIP PENK Inh L2-3 VIP CASC6
## 52 17 45
## Inh L2-4 PVALB WFDC2 Inh L2-4 SST FRZB Inh L2-4 VIP CBLN1
## 387 64 67
## Inh L2-4 VIP SPAG17 Inh L2-5 PVALB SCUBE3 Inh L2-5 VIP SERPINF1
## 33 32 55
## Inh L2-5 VIP TYR Inh L2-6 LAMP5 CA1 Inh L2-6 VIP QPCT
## 62 256 37
## Inh L3-5 SST ADGRG6 Inh L3-6 SST HPGD Inh L3-6 SST NPY
## 132 60 15
## Inh L3-6 VIP HS3ST3A1 Inh L4-5 PVALB MEPE Inh L4-5 SST STK32A
## 80 64 93
## Inh L4-6 PVALB SULF1 Inh L4-6 SST B3GAT2 Inh L4-6 SST GXYLT2
## 167 182 41
## Inh L5-6 GAD1 GLP1R Inh L5-6 PVALB LGR5 Inh L5-6 SST KLHDC8A
## 27 52 63
## Inh L5-6 SST MIR548F2 Inh L5-6 SST NPM1P10 Inh L5-6 SST TH
## 80 79 27
## Micro L1-3 TYROBP no class Oligo L1-6 OPALIN
## 63 325 313
## OPC L1-6 PDGFRA
## 238
cell_data$cell_type = sapply(strsplit(cell_data$cluster, split=" "), "[", 1)
table(cell_data$cell_type)##
## Astro Endo Exc Inh Micro no Oligo OPC
## 291 9 10525 4164 63 325 313 238
cell_data$cell_type[which(cell_data$cell_type == "no")] = "unknown"
all(colnames(counts) == cell_data$sample_name)## [1] TRUE
sce = SingleCellExperiment(assays = list(counts = counts), colData = cell_data)
rm(counts, cell_data)Import gene infomration and add them into the sce object. We also check for spike ins. Those gene names starting with ERCC are indeed gene names rather than labels for spike-ins.
gene_dat = smart_RT(file.path(work_dir,"human_MTG_2018-06-14_genes-rows.csv"),
sep=',', header=TRUE)
dim(gene_dat)## [1] 50281 5
gene_dat[1:3,]## gene chromosome entrez_id
## 1 3.8-1.2 6 353007
## 2 3.8-1.3 6 353008
## 3 3.8-1.4 6 353009
## gene_name mouse_homologenes
## 1 HLA complex group 26 (non-protein coding) pseudogene
## 2 HLA complex group 26 (non-protein coding) pseudogene
## 3 HLA complex group 26 (non-protein coding) pseudogene
all(rownames(sce) == as.character(gene_dat$entrez_id))## [1] TRUE
rowData(sce) = gene_dat
rownames(sce) = rowData(sce)$gene
sce## class: SingleCellExperiment
## dim: 50281 15928
## metadata(0):
## assays(1): counts
## rownames(50281): 3.8-1.2 3.8-1.3 ... bA255A11.4 bA395L14.12
## rowData names(5): gene chromosome entrez_id gene_name
## mouse_homologenes
## colnames(15928): F1S4_160106_001_B01 F1S4_160106_001_C01 ...
## F2S4_170405_060_F01 F2S4_170405_060_H01
## colData names(35): sample_name sample_id ... cluster cell_type
## reducedDimNames(0):
## spikeNames(0):
rm(gene_dat)
rowData(sce)[grep("^ERCC", rowData(sce)$gene),]## DataFrame with 10 rows and 5 columns
## gene chromosome entrez_id
## <character> <character> <integer>
## ERCC1 ERCC1 19 2067
## ERCC2 ERCC2 19 2068
## ERCC3 ERCC3 2 2071
## ERCC4 ERCC4 16 2072
## ERCC5 ERCC5 13 2073
## ERCC6 ERCC6 10 2074
## ERCC6-PGBD3 ERCC6-PGBD3 10 101243544
## ERCC6L ERCC6L X 54821
## ERCC6L2 ERCC6L2 9 375748
## ERCC8 ERCC8 5 1161
## gene_name
## <character>
## ERCC1 excision repair cross-complementation group 1
## ERCC2 excision repair cross-complementation group 2
## ERCC3 excision repair cross-complementation group 3
## ERCC4 excision repair cross-complementation group 4
## ERCC5 excision repair cross-complementation group 5
## ERCC6 excision repair cross-complementation group 6
## ERCC6-PGBD3 ERCC6-PGBD3 readthrough
## ERCC6L excision repair cross-complementation group 6-like
## ERCC6L2 excision repair cross-complementation group 6-like 2
## ERCC8 excision repair cross-complementation group 8
## mouse_homologenes
## <character>
## ERCC1 Ercc1
## ERCC2 Ercc2
## ERCC3 Ercc3
## ERCC4 Ercc4
## ERCC5 Ercc5
## ERCC6 Ercc6
## ERCC6-PGBD3
## ERCC6L Ercc6l
## ERCC6L2 Ercc6l2
## ERCC8 Ercc8
sce## class: SingleCellExperiment
## dim: 50281 15928
## metadata(0):
## assays(1): counts
## rownames(50281): 3.8-1.2 3.8-1.3 ... bA255A11.4 bA395L14.12
## rowData names(5): gene chromosome entrez_id gene_name
## mouse_homologenes
## colnames(15928): F1S4_160106_001_B01 F1S4_160106_001_C01 ...
## F2S4_170405_060_F01 F2S4_170405_060_H01
## colData names(35): sample_name sample_id ... cluster cell_type
## reducedDimNames(0):
## spikeNames(0):
Next step we apply QC based on a set of features per cell. The code and filtering below are motivated by the vignette presented here.
sort(names(colData(sce)))## [1] "age_days" "brain_hemisphere"
## [3] "brain_region" "brain_subregion"
## [5] "cell_type" "class"
## [7] "cluster" "complexity_cg"
## [9] "donor" "facs_container"
## [11] "facs_date" "facs_sort_criteria"
## [13] "genes_detected_cpm_criterion" "genes_detected_fpkm_criterion"
## [15] "library_prep_avg_size_bp" "library_prep_set"
## [17] "organism" "percent_aligned_reads_total"
## [19] "percent_ecoli_reads" "percent_exon_reads"
## [21] "percent_intergenic_reads" "percent_intron_reads"
## [23] "percent_mt_exon_reads" "percent_reads_unique"
## [25] "percent_rrna_reads" "percent_synth_reads"
## [27] "rna_amplification_set" "sample_id"
## [29] "sample_name" "sample_type"
## [31] "seq_batch" "seq_name"
## [33] "seq_tube" "sex"
## [35] "total_reads"
sort(names(rowData(sce)))## [1] "chromosome" "entrez_id" "gene"
## [4] "gene_name" "mouse_homologenes"
sce = calculateQCMetrics(sce)
sort(names(colData(sce)))## [1] "age_days" "brain_hemisphere"
## [3] "brain_region" "brain_subregion"
## [5] "cell_type" "class"
## [7] "cluster" "complexity_cg"
## [9] "donor" "facs_container"
## [11] "facs_date" "facs_sort_criteria"
## [13] "genes_detected_cpm_criterion" "genes_detected_fpkm_criterion"
## [15] "is_cell_control" "library_prep_avg_size_bp"
## [17] "library_prep_set" "log10_total_counts"
## [19] "log10_total_features_by_counts" "organism"
## [21] "pct_counts_in_top_100_features" "pct_counts_in_top_200_features"
## [23] "pct_counts_in_top_50_features" "pct_counts_in_top_500_features"
## [25] "percent_aligned_reads_total" "percent_ecoli_reads"
## [27] "percent_exon_reads" "percent_intergenic_reads"
## [29] "percent_intron_reads" "percent_mt_exon_reads"
## [31] "percent_reads_unique" "percent_rrna_reads"
## [33] "percent_synth_reads" "rna_amplification_set"
## [35] "sample_id" "sample_name"
## [37] "sample_type" "seq_batch"
## [39] "seq_name" "seq_tube"
## [41] "sex" "total_counts"
## [43] "total_features_by_counts" "total_reads"
sort(names(rowData(sce)))## [1] "chromosome" "entrez_id"
## [3] "gene" "gene_name"
## [5] "is_feature_control" "log10_mean_counts"
## [7] "log10_total_counts" "mean_counts"
## [9] "mouse_homologenes" "n_cells_by_counts"
## [11] "pct_dropout_by_counts" "total_counts"
par(mfrow=c(3,2),mar=c(5, 4, 1, 1), bty="n",cex=0.9)
smart_hist(log10(sce$total_counts),xlab="log10(Library sizes)",
main="",breaks=40,ylab="Number of cells")
smart_hist(log10(sce$total_features_by_counts),xlab="log10(# of expressed genes)",
main="",breaks=40,ylab="Number of cells")
smart_hist(sce$percent_rrna_reads, xlab="Ribosome prop. (%)",
ylab="Number of cells", breaks=40,main="")
smart_hist(sce$percent_mt_exon_reads, xlab="Mitochondrial prop. (%)",
ylab="Number of cells", breaks=80,main="")
smart_hist(sce$percent_synth_reads, xlab="Synthetic reads (%)",
ylab="Number of cells", breaks=40,main="")
smart_hist(sce$percent_reads_unique, xlab="Unique reads (%)",
ylab="Number of cells", breaks=80,main="")par(mfrow=c(3,2),mar=c(5, 4, 1, 1),bty="n",cex=0.9)
plot(log10(sce$total_counts),log10(sce$total_features_by_counts),
xlab="log10(Library sizes)",ylab="log10(# of expressed genes)",
pch=20,cex=0.5,col=rgb(0,0,0,0.5))
plot(log10(sce$total_counts),sce$percent_synth_reads,
xlab="log10(Library sizes)",ylab="synth reads percent (%)",
pch=20,cex=0.5,col=rgb(0,0,0,0.5))
plot(log10(sce$total_counts),sce$percent_reads_unique,
xlab="log10(Library sizes)", ylab="unique reads percent (%)",
pch=20,cex=0.5,col=rgb(0,0,0,0.5))
plot(sce$percent_exon_reads,sce$percent_reads_unique,
xlab="exon reads percent (%)", ylab="unique reads percent (%)",
pch=20,cex=0.5,col=rgb(0,0,0,0.5))
plot(sce$percent_aligned_reads_total,sce$percent_reads_unique,
xlab="aligned reads percent (%)",ylab="unique reads percent (%)",
pch=20,cex=0.5,col=rgb(0,0,0,0.5))
plot(sce$genes_detected_fpkm_criterion,sce$percent_reads_unique,
xlab="detected genes (%)",ylab="unique reads percent (%)",
pch=20,cex=0.5,col=rgb(0,0,0,0.5))# Removing outliers defined as being percent_reads_unique lower than 50%
keep = sce$percent_reads_unique > 50
table(keep)## keep
## FALSE TRUE
## 70 15858
table(colData(sce)$cell_type, keep)## keep
## FALSE TRUE
## Astro 3 288
## Endo 0 9
## Exc 52 10473
## Inh 13 4151
## Micro 0 63
## Oligo 0 313
## OPC 0 238
## unknown 2 323
We then subset the sce object to keep high quality samples(cells).
sce = sce[,keep]
dim(sce)## [1] 50281 15858
We keep those genes that are expressed in at least 30 cells, which is roughly 0.2% of the cells. This match to the goal of this study, to detect cell types as rare as 0.2% of all the cells.
rowData(sce)[1:2,]## DataFrame with 2 rows and 12 columns
## gene chromosome entrez_id
## <character> <character> <integer>
## 3.8-1.2 3.8-1.2 6 353007
## 3.8-1.3 3.8-1.3 6 353008
## gene_name
## <character>
## 3.8-1.2 HLA complex group 26 (non-protein coding) pseudogene
## 3.8-1.3 HLA complex group 26 (non-protein coding) pseudogene
## mouse_homologenes is_feature_control mean_counts
## <character> <logical> <numeric>
## 3.8-1.2 FALSE 0.00878955298844802
## 3.8-1.3 FALSE 0.0573204419889503
## log10_mean_counts n_cells_by_counts pct_dropout_by_counts
## <numeric> <integer> <numeric>
## 3.8-1.2 0.00380057604028069 7 99.9560522350578
## 3.8-1.3 0.024206628837133 21 99.8681567051733
## total_counts log10_total_counts
## <integer> <numeric>
## 3.8-1.2 140 2.14921911265538
## 3.8-1.3 913 2.96094619573383
summary(rowData(sce)$mean_counts)## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 0.02 0.31 15.13 5.56 101644.02
table(rowData(sce)$mean_counts == 0)##
## FALSE TRUE
## 48278 2003
summary(rowData(sce)$n_cells_by_counts)## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 30 244 1744 1893 15928
table(colData(sce)$cell_type)##
## Astro Endo Exc Inh Micro Oligo OPC unknown
## 288 9 10473 4151 63 313 238 323
par(mfrow=c(2,2), mar=c(5,4,1,1))
smart_hist(log10(rowData(sce)$mean_counts+1e-6),main="",
breaks=40, xlab="log10(ave # of UMI + 1e-6)")
smart_hist(log10(rowData(sce)$n_cells_by_counts+1),main="",
breaks=40, xlab="log10(# of expressed cells + 1)")
smoothScatter(log10(rowData(sce)$mean_counts+1e-6),
log10(rowData(sce)$n_cells_by_counts + 1),
xlab="log10(ave # of UMI + 1e-6)",
ylab="log10(# of expressed cells + 1)")
tb1 = table(rowData(sce)$n_cells_by_counts)
tb1[1:11]##
## 0 1 2 3 4 5 6 7 8 9 10
## 2003 567 555 559 662 574 539 562 494 441 445
ncol(sce)*0.002## [1] 31.716
min_detect_min_sample = rowSums(counts(sce) > 0) > 30
table(min_detect_min_sample)## min_detect_min_sample
## FALSE TRUE
## 12624 37657
min_mean_counts05 = rowData(sce)$mean_counts > 0.5
table(min_mean_counts05)## min_mean_counts05
## FALSE TRUE
## 27404 22877
table(min_detect_min_sample,min_mean_counts05)## min_mean_counts05
## min_detect_min_sample FALSE TRUE
## FALSE 12624 0
## TRUE 14780 22877
sce = sce[which(min_detect_min_sample),]
dim(sce)## [1] 37657 15858
# Next we check those highly expressed genes
par(mfrow=c(1,2),mar=c(5,8,1,1))od1 = order(rowData(sce)$mean_counts, decreasing = TRUE)
barplot(rowData(sce)$mean_counts[od1[20:1]], las=1,
names.arg=rowData(sce)$gene[od1[20:1]],
horiz=TRUE, cex.names=1, cex.axis=0.7,
xlab="ave # of UMI")
barplot(log10(rowData(sce)$mean_counts[od1[20:1]]), las=1,
names.arg=rowData(sce)$gene[od1[20:1]],
horiz=TRUE, cex.names=1, cex.axis=0.7,
xlab="log10(ave # of UMI)")saveRDS(sce,file.path(work_dir,"post_gene_filter.rds"))
gc()## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 5667496 302.7 10052351 536.9 8403104 448.8
## Vcells 350111536 2671.2 1199268152 9149.7 1250756922 9542.6
# To load image
# sce = readRDS(file.path(work_dir,"post_gene_filter.rds"))A simple solution for normalization and stablizing expression variance across genes is to tranform the count data by log(count/size.factor + 1). One may calcualte size.factor per cell as the total number of reads/UMIs, and this assumes the total expression are the same across all the cells. However, the total expression of each cell may vary with respect to cell type and/or cell size, and the computeSumFactors function in R package scran provides a more sophisicated way to calcualte size.factor to allow such variation across cells (Lun, Bach, and Marioni 2016). computeSumFactors can use initial clustering of cells to normalize expression within and beetween clusters. Within a cluster, it estimates the size factor for many groups of cells so that there are more groups than cells, and then it can calcualte the size factor per cell using a linear deconvolution system.
This method will fail (i.e., giving negative size factors) if there are too many zeros in the count data. Therefore it is necessesary to remove “genes with a library size-adjusted average count below a specified threshold” using the parameter min.mean. See here for more explanations.
min_mean = 1
date()## [1] "Sun Feb 17 21:51:55 2019"
clusters = quickCluster(sce, min.mean=min_mean, method="igraph")
table(clusters)## clusters
## 1 2 3 4 5 6 7 8 9
## 1422 254 236 2335 791 351 3481 4356 2632
date()## [1] "Sun Feb 17 21:55:52 2019"
sce = computeSumFactors(sce, cluster=clusters, min.mean=min_mean)
date()## [1] "Sun Feb 17 22:02:55 2019"
summary(sizeFactors(sce))## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.05659 0.81609 0.99811 1.00000 1.16893 7.05769
As shown in the following plot, the final size factor estimation is indeed highly correlated with the naive definition by total count.
par(mfrow=c(1,1), mar=c(5,4,2,1), bty="n")
plot(sce$total_counts, sizeFactors(sce), log="xy",
xlab="total counts", ylab="size factors",
cex=0.3, pch=20, col=rgb(0.1,0.2,0.7,0.3))Finally, the command normalize(sce) adds the normalized expression into the variable sce.
sce = normalize(sce)For dimension reduction, such as calculating PCA or performing TSNE, we should start by identifying a subset of genes with high level of biological signal relative to background (technical) noise. We start by examinng mean-variance relationship.
Since the information of individual spike-ins have been removed from this MTG dataset, we implemented the code below similar to here.
date()## [1] "Sun Feb 17 22:03:13 2019"
new_trend = makeTechTrend(x=sce)
date()## [1] "Sun Feb 17 22:09:51 2019"
fit = trendVar(sce, use.spikes=FALSE, loess.args=list(span=0.05))
date()## [1] "Sun Feb 17 22:10:18 2019"
par(mfrow=c(1,1), mar=c(5,4,2,1), bty="n")
plot(fit$mean, fit$vars, pch=20, col=rgb(0.1,0.2,0.7,0.6),
xlab="log(mean)", ylab="var")
curve(new_trend(x), col="red", lwd=2, add=TRUE)
curve(fit$trend(x), col="orange", lwd=2, add=TRUE)
legend("topright", legend=c("Poisson noise", "observed trend"),
lty=1, lwd=2, col=c("red", "orange"), bty="n")The above function makeTechTrend() assumes a Poisson model and from the above plot, it is clearly not a suitable fit. So we will keep fit equal to the loess fit from trendVar() rather than setting it equal to the output from makeTechTrend().
In the following code, we used the decomposeVar function from R/cran to estimate the technical/biological component of variance of each gene.
The technical component of the variance for each gene is determined by interpolating the fitted trend in fit at the mean log-count for that gene. This represents variance due to sequencing noise, variability in capture efficiency, etc. The biological component is determined by subtracting the technical component from the total variance. Highly variable genes (HVGs) can be identified as those with large biological components
dec = decomposeVar(fit=fit)
top_dec = dec[order(dec$bio,decreasing=TRUE),]
top_dec[1:10,]## DataFrame with 10 rows and 6 columns
## mean total bio
## <numeric> <numeric> <numeric>
## ENC1 6.09953103432473 17.6135525047028 8.22185678948232
## GAD1 2.61037642718004 17.482161519494 7.93040403516013
## SPARCL1 7.17861942532271 14.9308385190744 7.22008150620191
## SLC6A1 2.97727719686037 17.291562981475 7.11731234153986
## CHN1 7.91950610450711 13.7011554174148 6.95860626939825
## CXCL14 1.75210540159507 14.2949412930129 6.89414271521293
## ARPP21 6.91755936616036 14.833062933671 6.75377181556627
## CCK 5.70554628157203 16.4903035771684 6.54112112865473
## SYNPR 2.90912717097025 16.4483987038949 6.39464948644521
## KCNIP4-IT1 5.62880780753245 16.3288612093542 6.27955725014184
## tech p.value FDR
## <numeric> <numeric> <numeric>
## ENC1 9.39169571522048 0 0
## GAD1 9.55175748433385 0 0
## SPARCL1 7.71075701287247 0 0
## SLC6A1 10.1742506399352 0 0
## CHN1 6.74254914801657 0 0
## CXCL14 7.40079857779996 0 0
## ARPP21 8.07929111810478 0 0
## CCK 9.94918244851365 0 0
## SYNPR 10.0537492174497 0 0
## KCNIP4-IT1 10.0493039592124 0 0
par(mfrow=c(2,2))
smart_hist(dec$bio,breaks=30,xlab="Biological Variance")
smart_hist(dec$FDR,breaks=30,xlab="FDR")
smart_hist(log10(dec$FDR + 1e-6),breaks=30,xlab="log10(FDR + 1e-6)")
par(mfrow=c(1,1))wtop = match(rownames(top_dec)[1:10], rownames(sce))
sce_sub = sce[wtop,]
dim(sce_sub)## [1] 10 15858
plotExpression(sce_sub, features=1:10)Here we only select approximately the top a few thousands highly variable genes (HVGs) based on tuning thresholds on dec$bio and dec$FDR from the earlier variance decomposition.
summary(dec$bio)## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -10.012105 -0.073947 -0.005551 -0.007053 0.083276 8.221857
summary(dec$FDR)## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 1.0000 0.5732 1.0000 1.0000
dec1 = dec
dec1$bio[which(dec1$bio < 1e-4)] = 1e-4
dec1$FDR[which(dec1$FDR < 1e-50)] = 1e-50
par(mfrow=c(2,2))
smart_hist(log10(dec1$bio), breaks=100, xlab="log10(bio)")
smart_hist(log10(dec1$FDR), breaks=100, main="", xlab="log10(FDR)")
plot(log10(dec1$bio), log10(dec1$FDR), xlab="log10(bio)", ylab="log10(FDR)",
pch=20, cex=0.5)
par(mfrow=c(1,1))FDR_thres = 1e-20
bio_thres = 0.1
FDR_keep = dec$FDR < FDR_thres
bio_keep = dec$bio > bio_thres
table(FDR_keep,bio_keep)## bio_keep
## FDR_keep FALSE TRUE
## FALSE 26064 4143
## TRUE 2705 4745
w2kp = which(dec1$FDR < FDR_thres & dec1$bio > bio_thres)
summary(rowData(sce)$n_cells_by_counts[w2kp])## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 145 842 2223 3645 5444 15722
sce_hvg = sce[w2kp,]
sce_hvg## class: SingleCellExperiment
## dim: 4745 15858
## metadata(1): log.exprs.offset
## assays(2): counts logcounts
## rownames(4745): A2M-AS1 AADAT ... ZXDA ZXDB
## rowData names(12): gene chromosome ... total_counts
## log10_total_counts
## colnames(15858): F1S4_160106_001_B01 F1S4_160106_001_C01 ...
## F2S4_170405_060_F01 F2S4_170405_060_H01
## colData names(44): sample_name sample_id ...
## pct_counts_in_top_200_features pct_counts_in_top_500_features
## reducedDimNames(0):
## spikeNames(0):
Next we use the selected genes for PCA and use top 50 PCs for TSNE plot.
edat = t(as.matrix(logcounts(sce_hvg)))
edat = scale(edat)
dim(edat)## [1] 15858 4745
edat[1:2,1:3]## A2M-AS1 AADAT AAED1
## F1S4_160106_001_B01 -0.1914392 -0.5988271 -0.202752
## F1S4_160106_001_C01 -0.1914392 1.2214955 -0.202752
ppk = propack.svd(edat,neig=50)
pca = t(ppk$d*t(ppk$u))
dim(pca)## [1] 15858 50
df_hvg = smart_df(pca)
rownames(df_hvg) = NULL
names(df_hvg) = paste0("PC",seq(ncol(df_hvg)))
df_hvg = smart_df(sample_name = colnames(sce_hvg), df_hvg)
all_vars = c("log10_total_features_by_counts", "sex", "brain_hemisphere",
"brain_subregion", "facs_sort_criteria", "class", "cluster",
"cell_type")
all_vars %in% names(colData(sce_hvg))## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
df_hvg = cbind(df_hvg, colData(sce_hvg)[,all_vars])
dim(df_hvg)## [1] 15858 59
df_hvg[1:2,c(1:3,51:ncol(df_hvg))]## DataFrame with 2 rows and 12 columns
## sample_name PC1 PC2
## <character> <numeric> <numeric>
## 1 F1S4_160106_001_B01 1.38570005225832 -17.0606948426093
## 2 F1S4_160106_001_C01 -20.5795338696423 -1.14700228587648
## PC50 log10_total_features_by_counts sex
## <numeric> <numeric> <character>
## 1 -3.19594986256135 3.72065535655172 M
## 2 -2.07134103237301 3.91634865227546 M
## brain_hemisphere brain_subregion facs_sort_criteria class
## <character> <character> <character> <character>
## 1 L L5 NeuN-positive GABAergic
## 2 L L5 NeuN-positive Glutamatergic
## cluster cell_type
## <character> <character>
## 1 Inh L4-6 SST B3GAT2 Inh
## 2 Exc L5-6 RORB TTC12 Exc
set.seed(100)
date()## [1] "Sun Feb 17 22:12:36 2019"
tsne = Rtsne(pca, pca = FALSE)
date()## [1] "Sun Feb 17 22:15:05 2019"
df_tsne = smart_df(tsne$Y)
names(df_tsne) = paste0("HVG_TSNE",seq(ncol(tsne$Y)))
dim(df_tsne)## [1] 15858 2
df_tsne[1:2,]## HVG_TSNE1 HVG_TSNE2
## 1 -2.909237 -39.09050
## 2 -14.032734 45.77537
df_hvg = smart_df(df_hvg, df_tsne)
ggplot_custom(DATA = df_hvg,X = "HVG_TSNE1",Y = "HVG_TSNE2",COL = "log10_total_features_by_counts",TYPE = "cont")ggplot_custom(DATA = df_hvg,X = "HVG_TSNE1",Y = "HVG_TSNE2",COL = "sex",TYPE = "cat")ggplot_custom(DATA = df_hvg,X = "HVG_TSNE1",Y = "HVG_TSNE2",COL = "brain_hemisphere",TYPE = "cat")ggplot_custom(DATA = df_hvg,X = "HVG_TSNE1",Y = "HVG_TSNE2",COL = "brain_subregion",TYPE = "cat")ggplot_custom(DATA = df_hvg,X = "HVG_TSNE1",Y = "HVG_TSNE2",COL = "facs_sort_criteria",TYPE = "cat")ggplot_custom(DATA = df_hvg,X = "HVG_TSNE1",Y = "HVG_TSNE2",COL = "class",TYPE = "cat")ggplot_custom(DATA = df_hvg,X = "HVG_TSNE1",Y = "HVG_TSNE2",COL = "cluster",TYPE = "cat")ggplot_custom(DATA = df_hvg,X = "HVG_TSNE1",Y = "HVG_TSNE2",COL = "cell_type",TYPE = "cat")reducedDims(sce_hvg) = SimpleList(PCA=pca, TSNE=tsne$Y)
sce_hvg## class: SingleCellExperiment
## dim: 4745 15858
## metadata(1): log.exprs.offset
## assays(2): counts logcounts
## rownames(4745): A2M-AS1 AADAT ... ZXDA ZXDB
## rowData names(12): gene chromosome ... total_counts
## log10_total_counts
## colnames(15858): F1S4_160106_001_B01 F1S4_160106_001_C01 ...
## F2S4_170405_060_F01 F2S4_170405_060_H01
## colData names(44): sample_name sample_id ...
## pct_counts_in_top_200_features pct_counts_in_top_500_features
## reducedDimNames(2): PCA TSNE
## spikeNames(0):
saveRDS(list(sce_hvg=sce_hvg, df_hvg=df_hvg),
file.path(work_dir, "post_redDim_HVG.rds"))
rm(edat)
gc()## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 6581384 351.5 10052352 536.9 10052352 536.9
## Vcells 1665071672 12703.5 2656405646 20266.8 2656399828 20266.8
We first try clustering using kmeans.
# rds = readRDS("post_redDim_HVG.rds"); df_hvg = rds$df_hvg; sce_hvg = rds$sce_hvg; rm(rds)
set.seed(100)
all_num_clust = c(10:20)
df_hvg = df_hvg[,!grepl("^KM_",names(df_hvg))]
for(num_clust in all_num_clust){
# num_clust = 15
cat(paste0("KM with ",num_clust," clusters.\n"))
kmeans_out = kmeans(reducedDim(sce_hvg,"PCA"),
centers = num_clust,
iter.max = 1e3,
nstart = 50,
algorithm = "MacQueen")
print(kmeans_out[c("betweenss","tot.withinss")])
km_label = paste0("KM_",num_clust)
df_hvg[[km_label]] = as.factor(kmeans_out$cluster)
print(table(df_hvg$cell_type, kmeans_out$cluster))
print(ggplot_custom(DATA = df_hvg,X = "HVG_TSNE1",Y = "HVG_TSNE2",
COL = km_label,TYPE = "cat"))
}## KM with 10 clusters.
## $betweenss
## [1] 6057756
##
## $tot.withinss
## [1] 5427852
##
##
## 1 2 3 4 5 6 7 8 9 10
## Astro 0 0 0 0 0 1 0 287 0 0
## Endo 0 0 6 0 0 3 0 0 0 0
## Exc 3315 0 9 1 2829 502 0 12 2741 1064
## Inh 2 830 14 1145 0 882 1276 1 1 0
## Micro 0 0 58 0 0 4 0 1 0 0
## Oligo 0 0 311 0 0 0 0 2 0 0
## OPC 0 0 233 0 0 1 0 4 0 0
## unknown 58 2 19 4 70 60 15 24 63 8
## KM with 11 clusters.
## $betweenss
## [1] 6302689
##
## $tot.withinss
## [1] 5182919
##
##
## 1 2 3 4 5 6 7 8 9 10 11
## Astro 287 0 0 1 0 0 0 0 0 0 0
## Endo 0 0 0 8 0 0 0 0 1 0 0
## Exc 11 2587 0 228 757 2995 0 2276 9 1609 1
## Inh 1 1 837 717 0 1 1283 0 12 1 1298
## Micro 0 0 0 62 0 0 0 0 1 0 0
## Oligo 2 0 0 1 0 0 0 0 310 0 0
## OPC 4 0 0 233 0 0 0 0 1 0 0
## unknown 23 64 3 55 4 43 15 65 14 32 5
## KM with 12 clusters.
## $betweenss
## [1] 6583190
##
## $tot.withinss
## [1] 4902417
##
##
## 1 2 3 4 5 6 7 8 9 10 11 12
## Astro 0 0 0 1 0 0 0 0 0 0 287 0
## Endo 0 0 1 7 0 1 0 0 0 0 0 0
## Exc 2543 2155 8 2 1061 135 1742 2815 0 1 11 0
## Inh 1 1 12 4 0 1098 1 0 848 903 1 1282
## Micro 0 0 1 62 0 0 0 0 0 0 0 0
## Oligo 0 0 310 1 0 0 0 0 0 0 2 0
## OPC 0 0 1 234 0 0 0 0 0 0 3 0
## unknown 54 62 14 13 8 39 14 74 3 5 22 15
## KM with 13 clusters.
## $betweenss
## [1] 6667206
##
## $tot.withinss
## [1] 4818402
##
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13
## Astro 0 0 0 0 0 0 1 0 0 287 0 0 0
## Endo 0 0 0 0 0 0 8 0 1 0 0 0 0
## Exc 1711 1674 2052 0 0 267 291 0 8 12 2014 2443 1
## Inh 2 0 1 823 1269 0 913 1 12 1 0 0 1129
## Micro 0 0 0 0 0 0 61 0 1 1 0 0 0
## Oligo 0 0 0 0 0 0 1 0 310 2 0 0 0
## OPC 0 0 0 0 0 0 0 234 1 3 0 0 0
## unknown 15 25 42 1 16 24 54 2 14 23 39 63 5
## KM with 14 clusters.
## $betweenss
## [1] 6824459
##
## $tot.withinss
## [1] 4661149
##
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13
## Astro 0 0 0 0 287 0 0 0 0 0 0 0 1
## Endo 0 0 0 0 0 0 0 0 1 0 0 0 8
## Exc 0 1 1752 1580 12 0 1671 0 8 1750 0 2043 3
## Inh 750 832 0 2 0 1246 1 1053 12 0 250 1 4
## Micro 0 0 0 0 0 0 0 0 1 0 0 0 62
## Oligo 0 0 0 0 2 0 0 0 310 0 0 0 1
## OPC 0 0 0 0 3 0 0 0 1 0 0 0 234
## unknown 10 4 26 14 22 16 94 13 14 28 0 37 17
##
## 14
## Astro 0
## Endo 0
## Exc 1653
## Inh 0
## Micro 0
## Oligo 0
## OPC 0
## unknown 28
## KM with 15 clusters.
## $betweenss
## [1] 7112286
##
## $tot.withinss
## [1] 4373322
##
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13
## Astro 0 0 0 0 0 0 0 0 0 0 0 0 0
## Endo 0 0 0 0 0 0 0 0 1 0 0 0 0
## Exc 0 0 1741 327 725 1639 1315 1 8 0 1493 1581 1629
## Inh 877 1051 0 0 0 0 1 832 12 1371 1 2 0
## Micro 0 0 0 0 0 0 0 0 1 0 0 0 0
## Oligo 0 0 0 0 0 0 0 0 310 0 0 0 0
## OPC 0 0 0 0 0 0 0 0 1 0 0 0 0
## unknown 3 11 35 7 4 28 28 4 14 24 87 14 25
##
## 14 15
## Astro 287 1
## Endo 0 8
## Exc 11 3
## Inh 1 3
## Micro 0 62
## Oligo 2 1
## OPC 3 234
## unknown 22 17
## KM with 16 clusters.
## $betweenss
## [1] 7233561
##
## $tot.withinss
## [1] 4252047
##
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13
## Astro 0 1 287 0 0 0 0 0 0 0 0 0 0
## Endo 0 8 0 0 0 0 0 1 0 0 0 0 0
## Exc 0 3 11 0 1516 1580 1 8 727 1306 1671 260 0
## Inh 940 4 1 1059 0 2 832 12 0 1 0 0 1299
## Micro 0 62 0 0 0 0 0 1 0 0 0 0 0
## Oligo 0 1 2 0 0 0 0 310 0 0 0 0 0
## OPC 0 234 3 0 0 0 0 1 0 0 0 0 0
## unknown 8 16 22 12 25 14 4 14 4 30 28 22 18
##
## 14 15 16
## Astro 0 0 0
## Endo 0 0 0
## Exc 1429 326 1635
## Inh 1 0 0
## Micro 0 0 0
## Oligo 0 0 0
## OPC 0 0 0
## unknown 75 7 24
## KM with 17 clusters.
## $betweenss
## [1] 7332195
##
## $tot.withinss
## [1] 4153413
##
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13
## Astro 0 0 0 287 0 0 0 0 1 0 0 0 0
## Endo 0 0 1 0 0 0 0 0 8 0 0 0 0
## Exc 0 1123 8 11 326 760 1332 1605 3 1 1324 0 1656
## Inh 1140 0 12 1 0 1 1 1 3 832 0 508 0
## Micro 0 0 1 0 0 0 0 0 62 0 0 0 0
## Oligo 0 0 310 2 0 0 0 0 1 0 0 0 0
## OPC 0 0 1 3 0 0 0 0 234 0 0 0 0
## unknown 14 19 14 22 7 20 82 28 16 4 23 12 32
##
## 14 15 16 17
## Astro 0 0 0 0
## Endo 0 0 0 0
## Exc 728 1596 0 0
## Inh 0 1 601 1050
## Micro 0 0 0 0
## Oligo 0 0 0 0
## OPC 0 0 0 0
## unknown 4 14 0 12
## KM with 18 clusters.
## $betweenss
## [1] 7517664
##
## $tot.withinss
## [1] 3967944
##
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13
## Astro 0 0 0 0 0 0 0 0 0 0 0 1 0
## Endo 0 0 0 1 1 0 0 0 0 0 0 7 0
## Exc 0 1591 1513 0 8 1307 728 0 260 2 326 0 1641
## Inh 607 2 0 1039 11 1 0 918 0 4 0 1 0
## Micro 0 0 0 0 0 0 0 0 0 62 0 0 0
## Oligo 0 0 0 0 310 0 0 0 0 1 0 0 0
## OPC 0 0 0 0 1 0 0 0 0 0 0 234 0
## unknown 0 14 24 15 12 31 4 11 22 9 7 7 24
##
## 14 15 16 17 18
## Astro 0 287 0 0 0
## Endo 0 0 0 0 0
## Exc 1429 11 1 0 1656
## Inh 1 1 833 733 0
## Micro 0 1 0 0 0
## Oligo 0 2 0 0 0
## OPC 0 3 0 0 0
## unknown 75 21 4 15 28
## KM with 19 clusters.
## $betweenss
## [1] 7539736
##
## $tot.withinss
## [1] 3945871
##
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13
## Astro 0 0 0 0 0 0 0 0 287 0 0 0 0
## Endo 0 0 0 0 0 0 0 0 0 0 1 0 1
## Exc 1614 1 326 1401 0 260 726 615 11 1325 12 1336 8
## Inh 0 761 1 0 579 0 0 0 1 1 707 0 12
## Micro 0 0 0 0 0 0 0 0 0 0 0 0 1
## Oligo 0 0 0 0 0 0 0 0 2 0 0 0 310
## OPC 0 0 0 0 0 0 0 0 3 0 0 0 1
## unknown 28 3 7 23 0 22 4 7 22 29 26 66 14
##
## 14 15 16 17 18 19
## Astro 0 0 1 0 0 0
## Endo 0 0 7 0 0 0
## Exc 1510 0 2 0 1326 0
## Inh 0 870 3 669 2 545
## Micro 0 0 62 0 0 0
## Oligo 0 0 1 0 0 0
## OPC 0 0 234 0 0 0
## unknown 25 7 10 13 13 4
## KM with 20 clusters.
## $betweenss
## [1] 7673430
##
## $tot.withinss
## [1] 3812178
##
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13
## Astro 0 0 0 0 287 0 0 0 0 0 0 0 0
## Endo 0 0 0 0 0 0 0 0 0 1 0 0 0
## Exc 260 0 763 0 11 1214 730 326 1152 8 1 1041 0
## Inh 0 757 1 871 1 1 0 0 0 11 818 1 1004
## Micro 0 0 0 0 1 0 0 0 0 0 0 0 0
## Oligo 0 0 0 0 2 0 0 0 0 310 0 0 0
## OPC 0 0 0 0 3 0 0 0 0 1 0 0 0
## unknown 22 1 21 7 21 70 4 7 20 12 4 9 11
##
## 14 15 16 17 18 19 20
## Astro 0 0 0 0 0 1 0
## Endo 0 0 0 0 0 7 1
## Exc 2 1426 1311 1036 1192 0 0
## Inh 4 0 0 1 0 1 680
## Micro 62 0 0 0 0 0 0
## Oligo 1 0 0 0 0 0 0
## OPC 0 0 0 0 0 234 0
## unknown 9 21 23 9 25 6 21
Next we try to clustering cell using SC3. Code used here is based on this link. Since SC3 is computationally much more expensive, we only try three number of clusters, 5, 10, and 15.
Next we compare the clustering results from SC3 and cell types.
It looks SC3 with 5 clusters separate GABAergic, Glutamatergic neurons, and non-neurons, but clusering results with 10 or 15 clusters are noisy. Here we at a bit more details for the results with 5 clusters.
Finally we save the sce object, sce_hvg object, and the clustering results.
saveRDS(sce, file.path(work_dir, "final_sce.rds"))
saveRDS(sce_hvg, file.path(work_dir, "final_sce_hvg.rds"))
saveRDS(df_hvg, file.path(work_dir, "final_hvg_clust.rds"))sessionInfo()## R version 3.5.1 (2018-07-02)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Red Hat Enterprise Linux Server 7.6 (Maipo)
##
## Matrix products: default
## BLAS: /nas/longleaf/home/pllittle/downloads/R-3.5.1/lib64/R/lib/libRblas.so
## LAPACK: /nas/longleaf/home/pllittle/downloads/R-3.5.1/lib64/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] SC3_1.10.1 Rtsne_0.15
## [3] svd_0.4.1 data.table_1.12.0
## [5] limma_3.38.3 scran_1.10.2
## [7] scater_1.10.1 ggplot2_3.1.0
## [9] SingleCellExperiment_1.4.1 SummarizedExperiment_1.12.0
## [11] DelayedArray_0.8.0 BiocParallel_1.16.5
## [13] matrixStats_0.54.0 Biobase_2.42.0
## [15] GenomicRanges_1.34.0 GenomeInfoDb_1.18.1
## [17] IRanges_2.16.0 S4Vectors_0.20.1
## [19] BiocGenerics_0.28.0 BiocInstaller_1.32.1
## [21] rmarkdown_1.11 markdown_0.9
## [23] knitr_1.21
##
## loaded via a namespace (and not attached):
## [1] ggbeeswarm_0.6.0 colorspace_1.4-0
## [3] class_7.3-15 dynamicTreeCut_1.63-1
## [5] XVector_0.22.0 BiocNeighbors_1.0.0
## [7] mvtnorm_1.0-8 codetools_0.2-16
## [9] doParallel_1.0.14 robustbase_0.93-3
## [11] cluster_2.0.7-1 pheatmap_1.0.12
## [13] shiny_1.2.0 HDF5Array_1.10.1
## [15] rrcov_1.4-7 compiler_3.5.1
## [17] assertthat_0.2.0 Matrix_1.2-15
## [19] lazyeval_0.2.1 later_0.7.5
## [21] htmltools_0.3.6 tools_3.5.1
## [23] bindrcpp_0.2.2 igraph_1.2.3
## [25] gtable_0.2.0 glue_1.3.0
## [27] GenomeInfoDbData_1.2.0 reshape2_1.4.3
## [29] dplyr_0.7.8 doRNG_1.7.1
## [31] Rcpp_1.0.0 gdata_2.18.0
## [33] iterators_1.0.10 DelayedMatrixStats_1.4.0
## [35] xfun_0.4 stringr_1.3.1
## [37] mime_0.6 irlba_2.3.3
## [39] rngtools_1.3.1 gtools_3.8.1
## [41] WriteXLS_4.0.0 statmod_1.4.30
## [43] edgeR_3.24.3 DEoptimR_1.0-8
## [45] zlibbioc_1.28.0 scales_1.0.0
## [47] promises_1.0.1 rhdf5_2.26.2
## [49] RColorBrewer_1.1-2 yaml_2.2.0
## [51] gridExtra_2.3 pkgmaker_0.27
## [53] stringi_1.2.4 pcaPP_1.9-73
## [55] foreach_1.4.4 e1071_1.7-0.1
## [57] caTools_1.17.1.1 bibtex_0.4.2
## [59] rlang_0.3.1 pkgconfig_2.0.2
## [61] bitops_1.0-6 evaluate_0.13
## [63] lattice_0.20-38 ROCR_1.0-7
## [65] purrr_0.3.0 Rhdf5lib_1.4.2
## [67] bindr_0.1.1 labeling_0.3
## [69] cowplot_0.9.4 tidyselect_0.2.5
## [71] plyr_1.8.4 magrittr_1.5
## [73] R6_2.3.0 gplots_3.0.1.1
## [75] pillar_1.3.1 withr_2.1.2
## [77] RCurl_1.95-4.11 tibble_2.0.1
## [79] crayon_1.3.4 KernSmooth_2.23-15
## [81] viridis_0.5.1 locfit_1.5-9.1
## [83] grid_3.5.1 digest_0.6.18
## [85] xtable_1.8-3 httpuv_1.4.5.1
## [87] munsell_0.5.0 beeswarm_0.2.3
## [89] registry_0.5 viridisLite_0.3.0
## [91] vipor_0.4.5
Lun, Aaron TL, Karsten Bach, and John C Marioni. 2016. “Pooling Across Cells to Normalize Single-Cell Rna Sequencing Data with Many Zero Counts.” Genome Biology 17 (1). BioMed Central: 75.